What is an outlier? What visualization helps in discovering outliers with respect to one attribute of the interval type? With respect to two such attributes? Do you know a way to detect outliers in a higher-dimensional space?
What is regression? What is correlation? Which one is closer to supervised classification? Which one is closer to frequent itemset mining? Can you give reasons why "correlation does not imply causation"?
What type of dependence between the attributes is assumed by the Pearson correlation coefficient? Is that measure robust? Can you name a correlation measure that applies to ordinal attributes?
What is Simpson's paradox? Can you give an example? Bonus point to those who give a real-world example not in the slides (easy to find on the Web).
What is the difference between a supervised and an unsupervised task? Can you give examples of such tasks? What does it entail for their quality assessment?
What is dimensionality reduction? Different algorithms reduce the number of dimensions for different purposes. Which purposes?
Can you give two reasons to do a Principal Component Analysis? What does the first component maximize?
